Country and the city: Danish voters and party preferences

Task 1: Get spatial data for municipalities in Denmark

You can download administrative data for Denmark from the GADM dataset, the Global administrative boundaries, hosted by UCDavis. You do this by using the getData() function in the raster package. For GADM data, you need to specify what level of admin boundaries you wish to download (0=country, 1=first level subdivision aka regions, 2=second level aka municipalities, etc.). Read this blog on the power of raster package when it comes to available datasets.

Instructions:

  • Use getData() function and level = 2 to download the boundaries of Danish kommunes.
  • Note the class of the object and variable called NAME_2, it’s the name of the municipality
  • Convert the Spatial dataframe to sf object and project UTM32
  • Use mapview or tmap library to see a map with all the municipalities.
# Load the spatial data, project to UTM mun_sp<- getData('GADM', country = 'DK',
# level = 2)
mun_sp <- readRDS("data/gadm36_DNK_2_sp.rds")
mun_sf <- st_as_sf(mun_sp)
mun <- st_transform(mun_sf, crs = 32632)
mapview(mun)
# Straighten the names
sort(mun$NAME_2)
 [1] "Aabenraa"          "Aalborg"           "Ærø"              
 [4] "Albertslund"       "Allerød"           "Århus"            
 [7] "Assens"            "Ballerup"          "Billund"          
[10] "Bornholm"          "Brøndby"           "Brønderslev"      
[13] "Christiansø"       "Dragør"            "Egedal"           
[16] "Esbjerg"           "Faaborg-Midtfyn"   "Fanø"             
[19] "Favrskov"          "Faxe"              "Fredensborg"      
[22] "Fredericia"        "Frederiksberg"     "Frederikshavn"    
[25] "Frederikssund"     "Furesø"            "Gentofte"         
[28] "Gladsaxe"          "Glostrup"          "Greve"            
[31] "Gribskov"          "Guldborgsund"      "Haderslev"        
[34] "Halsnæs"           "Hedensted"         "Helsingør"        
[37] "Herlev"            "Herning"           "Hillerød"         
[40] "Hjørring"          "Høje Taastrup"     "Holbæk"           
[43] "Holstebro"         "Horsens"           "Hørsholm"         
[46] "Hvidovre"          "Ikast-Brande"      "Ishøj"            
[49] "Jammerbugt"        "Kalundborg"        "Kerteminde"       
[52] "København"         "Køge"              "Kolding"          
[55] "Læsø"              "Langeland"         "Lejre"            
[58] "Lemvig"            "Lolland"           "Lyngby-Taarbæk"   
[61] "Mariagerfjord"     "Middelfart"        "Morsø"            
[64] "Næstved"           "Norddjurs"         "Nordfyns"         
[67] "Nyborg"            "Odder"             "Odense"           
[70] "Odsherred"         "Randers"           "Rebild"           
[73] "Ringkøbing-Skjern" "Ringsted"          "Rødovre"          
 [ reached getOption("max.print") -- omitted 24 entries ]
which(grepl("Å", mun$NAME_2))
[1] 31
which(grepl("Vest", mun$NAME_2))
[1] 60
mun$NAME_2[31] <- "Aarhus"
mun$NAME_2[21] <- "Høje-Taastrup"
mun$NAME_2[60] <- "Vesthimmerlands"

Task 2: Wrangle the attributes and join to the spatial data

In order to show something we need to connect the spatial polygons with some attributes. Let’s pick the civil status table from Denmark Statistik and calculate the total numbers of men and women in each municipality and calculate the percentages for single men and women so that the singles know where to go to find a significant other :).

Here we get to practice basic tidyverse functions and the use of tmap package.

# Load the attributes
elec <- read_sheet("https://docs.google.com/spreadsheets/d/1ty3UrUiCK2iqWVk2T2GORaCl0QZ6feUTMmTakXPVIIg/edit#gid=0", col_types = "ccnnn")
write_csv(elec, "data/elections.csv")

# Check names
sort(unique(elec$Region))
 [1] "Aabenraa"          "Aalborg"           "Aarhus"           
 [4] "Ærø"               "Albertslund"       "Allerød"          
 [7] "Assens"            "Ballerup"          "Billund"          
[10] "Bornholm"          "Brøndby"           "Brønderslev"      
[13] "Christiansø"       "Dragør"            "Egedal"           
[16] "Esbjerg"           "Faaborg-Midtfyn"   "Fanø"             
[19] "Favrskov"          "Faxe"              "Fredensborg"      
[22] "Fredericia"        "Frederiksberg"     "Frederikshavn"    
[25] "Frederikssund"     "Furesø"            "Gentofte"         
[28] "Gladsaxe"          "Glostrup"          "Greve"            
[31] "Gribskov"          "Guldborgsund"      "Haderslev"        
[34] "Halsnæs"           "Hedensted"         "Helsingør"        
[37] "Herlev"            "Herning"           "Hillerød"         
[40] "Hjørring"          "Høje-Taastrup"     "Holbæk"           
[43] "Holstebro"         "Horsens"           "Hørsholm"         
[46] "Hvidovre"          "Ikast-Brande"      "Ishøj"            
[49] "Jammerbugt"        "Kalundborg"        "Kerteminde"       
[52] "København"         "Køge"              "Kolding"          
[55] "Læsø"              "Langeland"         "Lejre"            
[58] "Lemvig"            "Lolland"           "Lyngby-Taarbæk"   
[61] "Mariagerfjord"     "Middelfart"        "Morsø"            
[64] "Næstved"           "Norddjurs"         "Nordfyns"         
[67] "Nyborg"            "Odder"             "Odense"           
[70] "Odsherred"         "Randers"           "Rebild"           
[73] "Ringkøbing-Skjern" "Ringsted"          "Rødovre"          
 [ reached getOption("max.print") -- omitted 24 entries ]
# Check out the data
elec %>% 
  group_by(Party) %>% 
  summarize(sum2011 = sum(Y2011),
            sum2015 = sum(Y2015),
            sum2019 = sum(Y2019))  %>%
  janitor::adorn_totals(where = "row")
                           Party sum2011 sum2015 sum2019
            A. Socialdemokratiet  879615  924940  914882
             B. Radikale Venstre  336698  161009  304714
  C. Det Konservative Folkeparti  175047  118003  233865
 F. SF - Socialistisk Folkeparti  326192  147578  272304
             O. Dansk Folkeparti  436726  741746  308513
                           Total 2154278 2093276 2034278
# Create total electorate per municipality
electorate <- elec %>% 
  group_by(Region) %>% 
  summarize(sum2011 = sum(Y2011),
            sum2015 = sum(Y2015),
            sum2019 = sum(Y2019)) 

# Merge the summary with the granual election dataset and spatial polygons
elections <- mun %>% 
  select(NAME_2) %>% 
  merge(elec, by.x = "NAME_2",by.y ="Region") %>% 
  merge(electorate, by.x = "NAME_2",by.y ="Region") %>% 
  group_by(NAME_2, Party) %>% 
  mutate(pct_vote2011 = Y2011/sum2011*100,
         pct_vote2015 = Y2015/sum2015*100,
         pct_vote2019 = Y2019/sum2019*100)
elections
Simple feature collection with 495 features and 11 fields
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: 441745.6 ymin: 6049775 xmax: 892801.1 ymax: 6402207
projected CRS:  WGS 84 / UTM zone 32N
# A tibble: 495 x 12
# Groups:   NAME_2, Party [495]
   NAME_2 Party Y2011 Y2015 Y2019 sum2011 sum2015 sum2019
 * <chr>  <chr> <dbl> <dbl> <dbl>   <dbl>   <dbl>   <dbl>
 1 Aaben~ F. S~  2188   823  1449   20338   21886   18441
 2 Aaben~ A. S~  9088  7857  8858   20338   21886   18441
 3 Aaben~ C. D~  1259   639  1403   20338   21886   18441
 4 Aaben~ B. R~  2092  1175  1655   20338   21886   18441
 5 Aaben~ O. D~  5711 11392  5076   20338   21886   18441
 6 Aalbo~ C. D~  6893  2938  5930   84968   77489   79345
 7 Aalbo~ F. S~ 12910  5172  8952   84968   77489   79345
 8 Aalbo~ B. R~ 11258  5702  9800   84968   77489   79345
 9 Aalbo~ O. D~ 13549 23294  9652   84968   77489   79345
10 Aalbo~ A. S~ 40358 40383 45011   84968   77489   79345
# ... with 485 more rows, and 4 more variables: geometry <MULTIPOLYGON [m]>,
#   pct_vote2011 <dbl>, pct_vote2015 <dbl>, pct_vote2019 <dbl>
# Map some aspect of the result to see no counties are missing
elections %>% 
  group_by(NAME_2) %>% 
  filter(grepl("^A", Party)) %>%  # A.Socialdemokratie
  select(pct_vote2015) %>% 
  mapview()
# Save for later?
write_rds(elections, "data/elections_sp.rds") 

Task 3: Look at some of the data

# Let's map the two most popular parties, SD and Danske Folkeparti through time
library(tmap)
elections %>% filter(grepl("^A|^O", Party)) %>% tm_shape() + tm_facets("Party", ncol = 2) + 
    tm_polygons("pct_vote2011", title = "Percentage of Votes")

elections %>% filter(grepl("^A|^O", Party)) %>% tm_shape() + tm_facets("Party") + 
    tm_polygons("pct_vote2015", title = "Percentage of Votes")

elections %>% filter(grepl("^A|^O", Party)) %>% tm_shape() + tm_facets("Party") + 
    tm_polygons("pct_vote2019", title = "Percentage of Votes")

Task 4: Cartogram

As you can see from the maps, the area of municipalities varies considerably. When mapping them, the large areas carry more visual “weight” than small areas, although just as many people or more people live in the small areas. Voters in low-density rural regions can thus visually outweigh the urban hi-density populations.

One technique for correcting for this is the cartogram. This is a controlled distortion of the regions, expanding some and contracting others, so that the area of each region is proportional to a desired quantity, such as the population. The cartogram also tries to maintain the correct geography as much as possible, by keeping regions in roughly the same place relative to each other.

The cartogram package contains functions for creating cartograms. You give it a spatial data frame and the name of a column, and you get back a similar data frame but with regions distorted so that the region area is proportional to the column value of the regions.

You’ll also use the sf package for computing the areas of newly generated regions with the st_area() function.

Instructions

The elections sf object should be already loaded in your environment.

  • Load the cartogram and rgeos packages.
  • Plot total electorate against area for each region. Deviation from a straight line shows the degree of misrepresentation.
  • Create a cartogram scaling to the “pct_vote2011” column.
  • Check that the DF voter population is proportional to the area.
  • Plot the “pct_vote2011” percentage on the cartogram. Notice how some areas have relatively shrunk or grown.

Solution

 [1] "NAME_2"       "Party"        "Y2011"        "Y2015"        "Y2019"       
 [6] "sum2011"      "sum2015"      "sum2019"      "geometry"     "pct_vote2011"
[11] "pct_vote2015" "pct_vote2019"

Copacetic cartogram! Now try to rerun the cartogram for the Social Democrats in 2015

# Let's look at Social Democrats in 2015
DKSD <- elections %>% filter(grepl("^A", Party))

# Make a cartogram, scaling the area to the total number of votes cast in 2015
electorate2015 <- cartogram_cont(DKSD, "sum2015")

# Now check the linearity of the total voters per municipality plot
plot(electorate2015$sum2015, st_area(electorate2015, byid = TRUE))

# Make a cartogram, scaling the area to the percentage of SD voters
DF2015 <- cartogram_cont(DKSD, "pct_vote2015")

# Check the linearity of the SD voters percentage per municipality plot
plot(DF2015$pct_vote2015, st_area(DF2015, byid = TRUE))

# Make a adjusted map of the 2015 SD voters
plot(electorate2015$geometry, col = "beige", main = "Electorate in DK municipalities 2015")

plot(DF2015$geometry, col = "pink", main = "% of Social Democrat votes across DK in 2015")

Task 5: Spatial autocorrelation test

If we look at the facetted tmaps the election results in 2015 seem to have spatial correlation - specifically the percentage of voters favoring Danske Folkeparti increases as you move towards the German border. This trend is not as visible in the cartogram, where the growth is more apparent in Sjæland, and other islands, like Samsø. How much correlation is there, really? By correlation, we mean : pick any two kommunes that are neighbors - with a shared border - and the chances are they’ll be more similar than any two random boroughs. This can be a problem when using statistical models that assume, conditional on the model, that the data points are independent.

The spdep package has functions for measures of spatial correlation, also known as spatial dependency. Computing these measures first requires you to work out which regions are neighbors via the poly2nb() function, short for “polygons to neighbors”. The result is an object of class nb. Then you can compute the test statistic and run a significance test on the null hypothesis of no spatial correlation. The significance test can either be done by Monte-Carlo or theoretical models.

In this example you’ll use the Moran “I” statistic to test the spatial correlation of the Danske Folkeparti voters in 2015.

Instructions I - defining neighbors

  • Load the elections spatial dataset with attributes
  • Consider simplifying the boundaries if the data is too heavy for your computer and takes long to visualise
  • Load the spdep library and create nb object of neighbors using queen adjacency
  • Pass elections to poly2nb() to find the neighbors of each borough polygon. Assign to nb.
  • Get the center points of each borough by passing elections to st_centroid and then to st_coordinates(). Assign to mun_centers.
  • Update the basic map of the DK municipalities by adding the connections.
    • In the second plot call pass nb and mun_centers.
    • Also pass add = TRUE to add to the existing plot rather than starting a new one.
# Reload the data if needed elections <- readRDS('data/elections_sp.rds')
elections
Simple feature collection with 495 features and 11 fields
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: 441745.6 ymin: 6049775 xmax: 892801.1 ymax: 6402207
projected CRS:  WGS 84 / UTM zone 32N
# A tibble: 495 x 12
# Groups:   NAME_2, Party [495]
   NAME_2 Party Y2011 Y2015 Y2019 sum2011 sum2015 sum2019
 * <chr>  <chr> <dbl> <dbl> <dbl>   <dbl>   <dbl>   <dbl>
 1 Aaben~ F. S~  2188   823  1449   20338   21886   18441
 2 Aaben~ A. S~  9088  7857  8858   20338   21886   18441
 3 Aaben~ C. D~  1259   639  1403   20338   21886   18441
 4 Aaben~ B. R~  2092  1175  1655   20338   21886   18441
 5 Aaben~ O. D~  5711 11392  5076   20338   21886   18441
 6 Aalbo~ C. D~  6893  2938  5930   84968   77489   79345
 7 Aalbo~ F. S~ 12910  5172  8952   84968   77489   79345
 8 Aalbo~ B. R~ 11258  5702  9800   84968   77489   79345
 9 Aalbo~ O. D~ 13549 23294  9652   84968   77489   79345
10 Aalbo~ A. S~ 40358 40383 45011   84968   77489   79345
# ... with 485 more rows, and 4 more variables: geometry <MULTIPOLYGON [m]>,
#   pct_vote2011 <dbl>, pct_vote2015 <dbl>, pct_vote2019 <dbl>
plot(elections$geometry)

# Consider simplifying (don't go too high)
mun_sm <- st_cast(st_simplify(mun, dTolerance = 250), to = "MULTIPOLYGON")
plot(mun_sm$geometry)

# Use the spdep package
library(spdep)

# Make neighbor list following queen adjacency
nb <- poly2nb(mun_sm$geometry)
nb
Neighbour list object:
Number of regions: 99 
Number of nonzero links: 364 
Percentage nonzero weights: 3.713907 
Average number of links: 3.676768 
8 regions with no links:
4 6 43 55 57 79 84 89
# Get center points of each municipality
mun_centers <- st_coordinates(st_centroid(mun_sm$geometry))

# Show the connections
plot(mun_sm$geometry)
plot(nb, mun_centers, col = "red", add = TRUE)

Instructions II - Moran’s I

Now that your neighbors are determined and centroids are computed, let’s continuing with the Moran’s I statistic

  • Create a subset with municipalities for O.Danske Folkeparti
  • Feed the pct_2011 vector into moran.test().
    • moran.test() needs a weighted version of the nb object which you get by calling nb2listw().
    • After you specify your neighbor nbobject (mun_nb) you should define the weights style = "W". Here, style = "W" indicates that the weights for each spatial unit are standardized to sum to 1 (this is known as row standardization). For example, municipality 1 has 3 neighbors, and each of those neighbors will have weights of 1/3. This allows for comparability between areas with different numbers of neighbors.
    • You will need another argument in both spatial weights and at the level of the test. zero.policy= TRUE deals with situations when an area has no neighbors based on your definition of neighbor (many islands in Denmark). When this happens and you don’t include zero.policy= TRUE, you’ll get the following error
    • Run the test against the theoretical distribution of Moran’s I statistic. Find the p-value. Can you reject the null hypothesis of no spatial correlation?
  • Inspect a map of pct_2011.
  • Run another Moran I statistic test, this time on the percent of single women.
    • Use 999 Monte-Carlo iterations via moran.mc().
    • The first two arguments are the same as for moran.test().
    • You also need to pass the argument nsim = 999.
    • Note the p-value. Can you reject the null hypothesis this time?
# Let's look at Danske Folkeparti in 2015
DKF <- elections %>% filter(grepl("^O", Party))

# Run a Moran I test on total population of women
moran.test(DKF$pct_vote2015, nb2listw(nb, style = "W", zero.policy = TRUE), zero.policy = TRUE)

    Moran I test under randomisation

data:  DKF$pct_vote2015  
weights: nb2listw(nb, style = "W", zero.policy = TRUE)  n reduced by no-neighbour observations
  

Moran I statistic standard deviate = 1.4073, p-value = 0.07967
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
      0.100073843      -0.011111111       0.006242107 
# Run a Moran I test on the proportion of single women
moran.test(DKF$pct_vote2011, nb2listw(nb, style = "W", zero.policy = TRUE), zero.policy = TRUE)

    Moran I test under randomisation

data:  DKF$pct_vote2011  
weights: nb2listw(nb, style = "W", zero.policy = TRUE)  n reduced by no-neighbour observations
  

Moran I statistic standard deviate = 1.0343, p-value = 0.1505
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
       0.07087116       -0.01111111        0.00628223 
# Do a Monte Carlo simulation to get a better p-value
moran.mc(DKF$pct_vote2015, nb2listw(nb, zero.policy = TRUE), zero.policy = TRUE, 
    nsim = 999)

    Monte-Carlo simulation of Moran I

data:  DKF$pct_vote2015 
weights: nb2listw(nb, zero.policy = TRUE)  
number of simulations + 1: 1000 

statistic = 0.10007, observed rank = 935, p-value = 0.065
alternative hypothesis: greater

Marvellous Moran Testing! You should have found that the p-value was around 0.079 in 2015 and 0.15 in 2011 the first case, thus you did not find any significant spatial correlation. In Monte Carlo simulation, the p-value was around 0.053, so you did find some not very significant spatial correlation (strongly positive).

Repeat the same test for Social Democrats

# Let's look at Social Democrats
DKSD <- elections %>% filter(grepl("^A", Party))

# Run a Moran I test on total population of women
moran.test(DKSD$pct_vote2015, nb2listw(nb, style = "W", zero.policy = TRUE), zero.policy = TRUE)

    Moran I test under randomisation

data:  DKSD$pct_vote2015  
weights: nb2listw(nb, style = "W", zero.policy = TRUE)  n reduced by no-neighbour observations
  

Moran I statistic standard deviate = 1.0907, p-value = 0.1377
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
      0.075419759      -0.011111111       0.006294259 
# Run a Moran I test on the proportion of single women
moran.test(DKSD$pct_vote2011, nb2listw(nb, style = "W", zero.policy = TRUE), zero.policy = TRUE)

    Moran I test under randomisation

data:  DKSD$pct_vote2011  
weights: nb2listw(nb, style = "W", zero.policy = TRUE)  n reduced by no-neighbour observations
  

Moran I statistic standard deviate = 0.68026, p-value = 0.2482
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
      0.042503887      -0.011111111       0.006211842 
# Do a Monte Carlo simulation to get a better p-value
moran.mc(DKSD$pct_vote2011, nb2listw(nb, zero.policy = TRUE), zero.policy = TRUE, 
    nsim = 999)

    Monte-Carlo simulation of Moran I

data:  DKSD$pct_vote2011 
weights: nb2listw(nb, zero.policy = TRUE)  
number of simulations + 1: 1000 

statistic = 0.042504, observed rank = 772, p-value = 0.228
alternative hypothesis: greater
# Do a Monte Carlo simulation to get a better p-value
moran.mc(DKSD$pct_vote2015, nb2listw(nb, zero.policy = TRUE), zero.policy = TRUE, 
    nsim = 999)

    Monte-Carlo simulation of Moran I

data:  DKSD$pct_vote2015 
weights: nb2listw(nb, zero.policy = TRUE)  
number of simulations + 1: 1000 

statistic = 0.07542, observed rank = 871, p-value = 0.129
alternative hypothesis: greater

Phenomenal political testing. Social Democrats show even less correlation. P-value in Moran I test is was around 0.13 in 2011 results and 0.24 in 2015 results, thus no significant spatial correlation. In Monte Carlo simulation, the p-value was around 0.24, suggesting there is insignificant (positive) spatial correlation.

Well-done! Not so much correlation as it might seem at the first sight.